This assignment has been completed by Rahul Paul Gopireddy (#801307911) and Aryan Reddy Baddam (#801311891)

The insights will focus on trends and patterns within the Netflix catalog that could inform content acquisition, production, and marketing strategies.

Domain User: Content Strategist or Market Analyst in the Entertainment Industry

Insight 1: Distribution and Popularity of TV Shows and Movies Over the Last 20 Years

We first analyze the distribution of TV shows and movies on Netflix over the last 20 years, looking at how the number of productions and the duration of content change over time. Duration is measured in seasons for TV shows and in minutes for movies, so the two types are plotted separately.

In [2]:
import matplotlib.pyplot as plt
import pandas as pd

# Load the Netflix dataset
df = pd.read_csv('./data/netflix_titles.csv')

# Function to extract numeric value from duration
def extract_duration(duration):
    if isinstance(duration, str):  # Check if the duration is a string
        return int(duration.split()[0])
    return None  # Return None if duration is not a string

# Convert duration to numeric
df['duration_numeric'] = df['duration'].apply(extract_duration)

# Filter data for the last 20 years
current_year = pd.Timestamp.now().year
last_20_years_df = df[df['release_year'] >= current_year - 20]

# Separate TV shows and movies
tv_shows_df = last_20_years_df[last_20_years_df['type'] == 'TV Show']
movies_df = last_20_years_df[last_20_years_df['type'] == 'Movie']

# Plotting
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
plt.scatter(tv_shows_df['release_year'], tv_shows_df['duration_numeric'], alpha=0.5)
plt.xlabel('Release Year')
plt.ylabel('Duration (Seasons)')
plt.title('TV Shows (Last 20 Years)')

plt.subplot(1, 2, 2)
plt.scatter(movies_df['release_year'], movies_df['duration_numeric'], alpha=0.5)
plt.xlabel('Release Year')
plt.ylabel('Duration (Minutes)')
plt.title('Movies (Last 20 Years)')

plt.tight_layout()
plt.show()
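
The scatter plots show individual titles, so the yearly number of productions and the average duration may be easier to read from an aggregate table. The following is a minimal sketch, assuming the same `type`, `release_year`, and `duration` columns as the dataset loaded above. The helper name `yearly_summary` and the sample rows are hypothetical and only illustrate the shape of the result.

```python
import pandas as pd

def yearly_summary(frame):
    """Count titles and average the numeric duration per release year and type."""
    data = frame.copy()
    # Leading number of strings such as '90 min' or '2 Seasons'; NaN if missing
    data['duration_numeric'] = pd.to_numeric(
        data['duration'].str.extract(r'(\d+)', expand=False)
    )
    return (
        data.groupby(['release_year', 'type'])
            .agg(title_count=('duration_numeric', 'size'),
                 avg_duration=('duration_numeric', 'mean'))
            .reset_index()
    )

# Hypothetical sample rows, not taken from the real dataset
sample = pd.DataFrame({
    'type': ['Movie', 'Movie', 'TV Show', 'Movie'],
    'release_year': [2019, 2019, 2019, 2020],
    'duration': ['90 min', '110 min', '2 Seasons', None],
})
summary = yearly_summary(sample)
print(summary)
```

On the real data, calling `yearly_summary(last_20_years_df)` should give one row per year and type. Grouping by `type` keeps seasons and minutes from being averaged together.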

Insight 2: Exploring Content Release by Country and Year

Next, we explore which countries have been the most prolific in releasing content on Netflix and how this has changed over time.

In [3]:
import plotly.express as px

# Using Plotly to create an interactive scatter plot
fig = px.scatter(last_20_years_df, x='release_year', y='country', color='type',
                 title='Netflix Content Release by Country and Year',
                 labels={'release_year': 'Release Year', 'country': 'Country'})

fig.show()
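
In commonly distributed versions of this dataset, a `country` cell can list several countries in one comma-separated string and can also be empty. Assuming that holds here, ranking the most prolific countries needs those strings split first. The sketch below shows one way to do that; the helper name `titles_per_country` and the sample rows are hypothetical.

```python
import pandas as pd

def titles_per_country(frame, top_n=10):
    """Count titles per individual country, splitting multi-country rows."""
    countries = (
        frame['country']
        .dropna()                    # skip titles with no country listed
        .str.split(',')              # 'India, United States' -> ['India', ' United States']
        .explode(ignore_index=True)  # one row per (title, country) pair
        .str.strip()
    )
    countries = countries[countries != '']  # drop empties left by stray commas
    return countries.value_counts().head(top_n)

# Hypothetical sample rows, not taken from the real dataset
sample = pd.DataFrame({
    'country': ['United States', 'India, United States', None, 'France'],
})
counts = titles_per_country(sample)
print(counts)
```

With this approach a co-production counts once for each participating country, so the totals can exceed the number of titles.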

Insight 3: Directors' Impact - Number of Titles and Release Years

Finally, we examine the impact of directors by looking at the number of titles they have released and how those titles are distributed over the years.

In [4]:
from ipywidgets import interact, Dropdown

# Create a list of unique directors
directors = last_20_years_df['director'].dropna().unique()

def plot_movies_by_director(selected_director):
    director_movies = last_20_years_df[last_20_years_df['director'] == selected_director]
    if director_movies.empty:
        print(f"No data available for {selected_director}.")
        return

    plt.figure(figsize=(16, 8))

    # Plot the count of movies by director as a single bar
    # (passing the whole 'director' column would draw one overlapping bar per row)
    plt.subplot(2, 1, 1)
    plt.bar([selected_director], [len(director_movies)], color='skyblue')
    plt.ylabel('Number of Movies')
    plt.title(f"Movies Directed by {selected_director}")

    # Plotting the release years of movies by director
    plt.subplot(2, 1, 2)
    plt.scatter(director_movies['title'], director_movies['release_year'], color='orange')
    plt.ylabel('Release Year')
    plt.xlabel('Movie Title')
    plt.title(f"Release Years of Movies Directed by {selected_director}")
    plt.xticks(rotation=45)

    plt.tight_layout()
    plt.show()

# Create interactive dropdown menu
interact(plot_movies_by_director, selected_director=Dropdown(options=directors))
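
The dropdown shows one director at a time. To compare directors by number of titles, a ranking table may be a useful complement. This is a minimal sketch, assuming the `title`, `director`, and `release_year` columns used above. The helper name `director_summary` and the sample rows are hypothetical. As in the cell above, a `director` value naming several people is treated as a single label.

```python
import pandas as pd

def director_summary(frame, top_n=10):
    """Rank directors by title count, with their first and last release year."""
    return (
        frame.dropna(subset=['director'])
             .groupby('director')
             .agg(title_count=('title', 'count'),
                  first_year=('release_year', 'min'),
                  last_year=('release_year', 'max'))
             .sort_values(['title_count', 'last_year'], ascending=False)
             .head(top_n)
    )

# Hypothetical sample rows, not taken from the real dataset
sample = pd.DataFrame({
    'title': ['A', 'B', 'C', 'D'],
    'director': ['Dir One', 'Dir Two', 'Dir One', None],
    'release_year': [2015, 2018, 2020, 2021],
})
ranking = director_summary(sample)
print(ranking)
```

On the real data, `director_summary(last_20_years_df)` should list the directors with the most titles in the period, which could also be used to shorten the dropdown options.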